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BSTIMATIMG  ns  INTENSITY  OF  A POISSON  PROCESS 

G.  S.  Watson 
?rinomton  Vrtivtrtity 

1.  Introduction. 

Thara  is  an  inmansa  and  increasing  literature  on  point 
processes,  especially  Poisson  processes  in  which  the  in- 
tensity is  a fiinction  X(t)  of  a scalar  parameter  t,  usually 
time.  Summaries  are  given  in  Cox  and  Lewis  (1966) , Lewis 
(1972),  1975).  There  are  many  motivations.  Lewis  has  often 
been  concerned  with  times  of  breakdown  of  computers.  My  own 
interest  was  aroused  by  two  problems — the  effects  of  the 
environment  on  the  deaths  per  day  in  a large  city  and  a 
discussion  of  wildcat  oil  strikes  in  Alberta.  The  latter  was 
merely  a nusverical  example  in  a paper  by  Clevenson  and  Zidek 
(1975)  but  it  suggested  that  this  topic  would  be  appropriate 
for  this  Symposium.  However,  our  discussion  will  be 
theoretical— we  will  speak  of  events,  not  deaths  or  strikes— 
and  its  aim  is  to  show  the  simplicity  and  unity  of  a number 
of  statistical  and  mathematical  methods. 

If  the  chances  that  an  event  occurs  or  does  not  occur  in 
a smaller  Interval  (t,t4-dt)  are  given  by  X(t)dt,  and 
1 - X(t)dt,  respectively,  independently  of  what  happened 
before  t,  then  the  number  of  events  N(t)  in  (0,t)  is  said 
to  follow  a non- homogeneous  Poisson  process  (N.H.P.P.).  The 
numbers  of  events  in  non-overlapping  intervals  are  independent 
and  have  Poisson  distributions  whose  means  are  the  integrals 
cf  X(t)  over  the  relevant  time  intervals.  Classical  examples 
are  deaths  due  to  horse  kicks  in  the  Prussian  Army  and  clicks 
of  a Geiger  Counter. 

If  the  times  of  events  are  marked  on  a time  axis,  they 
will  be  densest,  on  the  average,  where  X(t)  is  greatest. 

In  this  respect,  X(t)  is  like  a probability  density.  One 
can  show  that  If  n events  occur  in  (0,T) , then  the  times  of 


n c 


the  eventa  uy  be  treated  as  a randoe  saaple  free  a 
probability  distribotion  with  density 

T 

£(t)  - X(t)//  X(t)  dt  . (1.1) 

0 


Given  a set  of  data  oonaiating  of  the  successive  tines  of 
events  from  an  N.H.P.P. , the  statistician  will  wish  to  esti- 
sate  the  function  X(t)  and  perhaps  test  sons  hypotheses 
about  it.  In  sone  circunstances,  the  fom  of  X(t)  icay  be 
given  and  the  problem  reduced  to  estinating  some  parameters 
in  the  formula  for  X (t) . 

Given  a random  eanple  from  some  distribution,  precisely 
the  same  problems  arise— estimating  its  density  or  parameters 
in  a formula  for  it. 

In  both  problems,  the  data  are  often  grouped  to  begin 
with  or  later  for  convenience— deaths  per  day,  strikes  per 
month.  The  mathematicnl  difficulties  of  estimating  X(t)  are 
greatly  reduced  by  dividing  the  time  axis  into  intervals  of 
equal  length  6.  We  will  start  with  such  data.  Clevenson 
and  Zldek  discussed  very  similar  topics  in  continuous  tisie 
and  were  essentially  forced  by  difficulties  to  suJee  discrete 
approximations  later. 

The  problem  of  estimating  a function  X(t},  0 < t < T or 
the  fxuiction  X^,  t <■  0,1,2  N-1  is  clearly  quite  distinct, 
given  a finite  amount  of  data,  fr<XB  the  more  usual  problem  of 
estimating  a few  parameters.  We  will  use  the  total  mean 
square  error  to  assess  the  quality  of  the  estimator,  i.e. , 
we  will  want  an  estimator  that  makes 


E 


N-1 

I 


(1.2) 


small.  We  will  further  consider  e measure  of  amoothneae  of 
the  sequence  X^  and  require  it  to  be  small  too,  on  the 
grounds  that  the  true  sequence  X^  is  not  rough,  nasMly 

M-1  2 

I j 


(1.3) 
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vher«  i>  the  p-th  diff«r«nc«  of  th*  sequenco. 

By  coobining  tboso  two  noasuros  of  variability,  wa  will 
have  aoaething  to  optinize.  It  will  turn  out  that  the  "beat* 
estiraator  dependa  upon  the  luknoira  sequence  ao  that,  in 
practice,  one  needs  to  put  in  one's  intuition.  While  this 
is  faaillar  in  tiaw  sories  analysis,  it  cones  as  surprise 
to  some  people  since  such  results  ace  concealed  in  elonentary 
statistics  by  assusdng  a parametric  form.  More  familiar  is 
our  restriction  to  a class  of  estimators  linear  in  the 
counts  of  events. 

Sometimes  it  is  realistic  to  assume  that  l(t)  is  itself, 
a random  function.  Thus,  we  will  have  two  cases~-fixed  and 
random  functions.  Since  X{t)  is  often  the  signal  that  some* 
thing  is  present,  it  is  often  referred  to  as  the  signal, 
rather  than  as  the  function. 

Of  course,  for  the  statistical  analyses  of  particular 
point  processes,  the  N.H.P.P.  model  may  not  be  appropriate. 

It  is  beyond  our  sc<^>e  to  consider  the  general  problem.  Ho%r-> 
ever,  in  Section  6,  we  briefly  mention  another  model.  In 
Section  7,  some  ideas  for  simpler  and  more  exploratory 
analyses  are  suggested. 

2 . A Time  Series  Approach  to  the  Fixed  Signal  Case. 

Suppose  we  have  the  number  of  events  in  N successive 
Intervals  and  that  these  are  denoted  by  n^,  n^,  ...,  with 

mean  values  Xq,  Xj^,  ...,  which  we  may  think  of  first  as 

constants— a "fixed  signal",  since  they  are  integrals  of 
X(t)  over  intervals  of  length  6,  if  X(t}  is  a smooth  func- 
tion, the  X^'s  will  behave  regularly,  and  be  smoother  than 
the  n^'s.  Therefore,  it  is  reasonable  to  consider  a moving 
average  estimator  of  X^  given  by 

..  K- 

*t  * ^ *k**t-k  * (2.1) 


To  avoid  negative  estimates,  the  Sj^  would  usually  be  taken 
as  non-negative.  Since  a constant  record  ought  to  be  equal 
to  its  estimate,  one  usually  supposes  Csj^  ■ 1.  Then  the 


aj^*s  font  a discrete  probability  distribution.  In  the 
absence  of  other  ideas»  it  would  be  syno&etric  around  sero» 
its  support*  determined  by  K*  would  be  large  if  1^  is  a very 
smooth  function*  small  if  it  is  not.  It  is  fairly  standard 

A 

to  call  a filtered  version  of  the  n^  series*  and  the  set 
of  a filter.  It  is  clear  that  smoothing  reduces  the 

variance  but  increases  the  bias. 

To  find  an  optimal  filter,  one  might  ask  for  the  filter 
which  odnimlzes  the  average  mean  square  error 


* I a. 


(2.2) 


where*  for  this  section  we  regard  the  X^*s  as  constants  and 
we  make  no  assvusptions  about  the  a^*s.  However*  one  cannot 
calculate  X^  for  t's  such  that  (2.1)  calls  for  n^'s  with  t 
outside  0*  ...,  N-1.  To  overcome  this,  we  put  the  n^  and  X^ 
equally  spaced  around  the  circumference  of  a circle.  It  is 
then  natural  to  identify  n^j  with  n^,  with  n^^  etc.*  and 

n_^  with  etc.  The  functions  n^,  X^,  X^  then  are  de- 

fined for  all  integers  and  they  have  period  N.  If  we  define 


'j 


— 

N 


* j«0,l,. . . ,N-1 


(2.3) 


exp  iWjt  also  has  period  N.  Multiplying  (2.1)  by  exp  iw^t 
and  adding  over  t from  0 to  N-1*  we  find 


X(«j)  - A((iij)n(Wj) 


(2.4) 


where 


. H-1  ^ 

X(Wj)  " ^ iWjt, 


A(w 


exp  iw^k 


(2.5) 


M-l 


n(w 


i”  J ■'< 


exp  Iwjt* 


where  we  make  no  aasua^ttiona  about  Xt  Since 


N-1 

I exp  Iw^t 
t-0  J 


if 

if 


j-0 


we  can  verify  that 


(2.6) 


and 


- 1 - 

X*.  • - I X(Wj)  exp  -iMjt 
^ N 0 J 


1 

N 


M-1  ^ 

I U(Wj)l 

0 J 


2 


(2.7) 


(2.8) 


with  similar  formulae  for  any  other  function  of  period  N. 
Thus 


N-1 

I 


0 


1 N-1  ^ 2 

- I |x(M.)  - x(«.)r 

N 0 ^ J 


1 N-1  , 

- - I |A(»u)n(«.)  - X(w.)r  • (2-9) 

N 0 J J ^ 


Hence,  the  criterion  (2.2)  may  be  rewritten:  find  k{w)  so 
that 


M-1  , 

B ^ |A(Uj)n((a^)  - X(Wj)i  > min  I (2.10) 

The  j-th  term' in  the  left-hand  side  (l.h.s.)  of  (2.10)  is 
|A(«^) |*E|n(Wj) I*  - A(w^)En(w^)X(Wj) 

- X(M^)En(»j)X(«j)  ♦ |X(»^)|*  . (2.11) 

Now 


N-1 

Ea(«^)  ■ T K(nj)  exp  iw^t  « X(w^)  , 


(2.12) 


6. 


■II  iMj(t-t*) 


i »t  + IxUj)!*  , 


(2.i:i) 


since  the  n.  are  independent  for  distinct  t.  Bn.  ■ and 

2^2  ^ 

E n^  • 'f  . Now  (2.11)  may  be  rewritten  as 


- En(«.)X(Wj) 

B|n(u^)P  A(uj) -3-. 

J J E|n(»^)| 

+ iM..)r  — r*— -ri — 

^ E|n((dj)P 


(2.14) 


so  that  every  tern  in  the  sum  (2.10)  is  minimized  if  we 
choose 


A(w^) 


|X(ej)r 

N-1 

I X + |X(w.)|' 
0 ^ 


(2.15) 


giving  (2.10)  a minimum  value  of 


N>1 

H-1  J *t 

I 2 lX(m)|- 

j«0  N-1  , J 

j X ♦ ix(ttj)r 


(2.16) 


The  optimal  filter*  defined  by  (2.15)*  has  the  following 
properties.  He  note  that  0 ^ A(Uj)  £ 1*  all  j.  Since 
A(Wj)  « A(u.j)  is  real*  a^^  ■ a_j^  (i.e.*  the  filter  is 
symmetric  abwt  sero)  t 
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1 N-1 

aj^  ■ - ^ A(w^)  exp  “ikuy 

1 

- — J A(w_a)  exp  -IJtWj, 


- I A(w_^)  exp  -i(-)c)w_., 
H J J 


" ‘-k  • 

Since  • A(0),  (2.15)  gives  (JX^  ■ X(0)  > 0) 

X(0)2 

" J ^ 

* X(0)  + X(O)'^ 

so  that  for  all  filters  considered  here,  there  is  a downwards 
bias.  Finally,  since  A(w^)  « A(w_j)  « it  will  be 

convenient  to  arrange  that  N ■ 2K  + 1 or  K ■ (N~l)/2. 

If  ■ X,  t - 0,...,N>1,  then  X(Wj)  *»  0 unless  j ■ 0 
when  it  is 

N-1 

I X - NX 
/\  ^ 


so  that 


A(0)  ■ -j  , A(w.)  • 0,  j ■ l,...,N-'l. 

NX  ♦ (NX)^  J 

Hence,  the  aj^  sequence  is  constant,  a^  say,  and  equal  to 
A(0)  /H.  Thus,  N a^  i 1 as  NX.-»«>  so  that  the  downward  bias 
disappears. 

By  contrast,  if  ■ X for  all  t except  t^  when  X^  ■ X-ff 
with  6 very  large,  then  ^ 


8 


!ax  •f  8 if  3»o  » 

(2.17) 

8 .«xp  i«jtQ  If  jf^O  , 


•o  that 


(HX  f 8) 


A(«»j)  - 


(NX  >!>  8)  •»-  (NX  * 8) 
*2 


(NX  + 8)  + 8" 


If  8 > NX,  A(u^)  - 1 for  all  j and 

= r ^ ® ' 

^ (O  , k f<  0 . 


j - 0 

j I*  0 . 


Given  the  signal  in  the  process,  this  is  very  reasonable—one 
would  not  want  to  smooth  at  all  since  only  one  X^  is  out  of 
line.  It  is  important  to  note  that  the  optimal  filter  is 
unaffected  by  tQ.  It  will  pick  up  this  signal  just  as  well 
wherever  it  is,  i.e.,  the  filter  is  time-invariant  with 
respect  to  the  position  of  the  signal. 

This  is  true  in  general  since  moving  the  signal  on  tQ 
time  units  is  the  sane  hero  as  multiplying  KUj)  by  exp  ‘■ietg 
and  A(ttj)  is  unaffected  since  it  depends  only  on  the  modulus 
of  X(e^). 

Finally,  the  computation  of  the  time  domain  filter  coef- 
ficients given  A((i>j)  can  be  done  by  using  the  Fast  Fourier 
Transform  to  calculate 

1 N-1 

a^  ■ — I A((i>j)  exp  -itt^k  . (2.18) 

* N j-0  ^ J 

Equally,  given  some  idea  of  X^,  it  can  be  used  to  calculate 
the  X(Uj)  which  will  be  needed  to  obtain  the  approximate 
A(Uj).  Naturally,  in  practice,  the  ideal  filter  cannot  be 
used  for  the  estimation  of  the  quantity  it  depends  on.  In 
recurring  problems,  or  in  the  testing  or  detection  problems, 
discussed  in  Section  4,  this  is  possible. 


9. 


The  circular  device  used  above  Is  an  often  exploited 
mathenatical  trick  to  make  the  mathematics  easy.  It  will 
only  lead  to  trouble  if  the  do  not  fall  off  to  zero 

rapidly  as  |k|  increases.  The  two  examples  show  when  we  may 
expect  it  to  work. 

3.  The  Case  of  a Random  X(t). 

It  was  suggested  in  Section  1 that  in  many  problems  A{t) 
should  be  taken  as  a random  function.  In  this  case  to  the 
expectation  in  (2.2)  and  so  (2.10)  should  be  added  another 
expectation  ^ over  all  X (t) . The  analysis  is  otherwise  the 
same.  Looking  at  (2.11),  (2.12),  and  (2.13),  we  need  to 
evaluate 

E JX{w.)  and  E J . 

X J X ' 

If  X(t)  is  a second-order  stationary  process  in  con- 
tinuous time,  then  X^  will  be  second-order  stationary  in 
discrete  time.  As  we  cannot  with  grouped  data  do  more  than 
study  the  X^  process,  we  will  only  make  assumptions  about  it. 
These  are  the  following t 

E X^  ■ X 

Cov(X^.,  X^^^)  ■ p(T),  XIp(t)|  < - 

1 • 

f(^)  . — I p(t)  exp  ix^, 

2v  -• 


X - X - e^^^  dZ(«), 

^ -s 

p(T)  - e^*^  f(^)d^, 

-w 


10. 


E dZ(^)dZ(#M  - 0 , 

X 

B |dZ(^) - f(«)d*  . 
X 


These  are  the  basic  relations  and  quantities  for  such  a 
process.  f(^}  is  called  the  (power)  spectral  density  of  the 
process.  Readers  unacquainted  with  this  theory  need  only 
regeurd  the  integral  formula  for  X^  - X as  an  expression  for 
this  deviation  from  zero  as  a sum  of  sinusoidal  oscillations 
with  uncorrelated  as^litudes  whose  average  magnitudes  vary 
with  the  frequency  as  described  by  f(^),  as  seen  in  the  last 
three  expectations.  The  fact  that  the  Fourier  series  for 
f(^)  has  coefficients  proportional  to  the  covariances  p(T) 


will  not  bs 

Important  here 

Then 

N-1 

X(«j)  - 

1 X exp  iu. 
0 ^ J 

N-1  IT  N-1 

« X I exp  Lui.t  + / I exp (i (♦+(»). )t)dZ(^)  . 
0 •'  “W  0 ■' 


So 

w 1 - ei»»^ 

X(0)  - NX  + / vy-  dZ(^), 

“X  1 - e ’ 

w 1 - ei*"**”!* 

»'“j>  * I - — TTfsirr  ' i >'  0 ■ 

•'  -w  1 - e 3' 

Hence 


E X (0)  - E T X*.  » NX 
X ' 

E X(«^)  - 0 , j 0 
X J 


I 

* — 


2 


E |X(0)|^ 
X 


f(«)d«. 


E |x(ci.)  r = / 

.X  J -If 


1 


1 


^iN{^+u)j) 

(4>+Wj) 


f {^)d4'. 


(3.2) 


(3.3) 


The  multiplier  of  f(4')  when  divided  by  2tiN  in  these  last  two 
integrals  is  called  the  Fejer  kernel,  i.e., 


g(«) 


1 1 
2irN  1 


iN^  IN^ 


1 

2irN 


sin 


sin 


N<t> 

2 

4> 

2 


2 


(3.4) 


It  may  be  shown  that  9(0)  has  an  integral  from  -n  to  ir  of 
unity  and  that  it  resembles  more  and  more  as  N-^  a Dirac 
delta  function  with  singularity  at  ^ » 0.  The  same  is  true 
of  g(^+u»j)  except  that  the  peak  is  mcved  to  -Wj. 

Thus,  for  large  N,  (3.2)  and  (3.3)  become  approximately 

E|X{0)|^  = N^X^  + 2nH  f(0)  - N^X^ 

E|X(<Dj)  : 2irN  f (u^) 

so  that  the  analogue  of  (2.14)  here  reads 


(3.5) 


12 


and 


(NX  27rN 


A(«^)  - 


2yN  f(Mj) 


NX  + 2irN  f(w^) 


+ 2irN  f(o^) 


2kN  f(w^) 

NX  2irN  f((i>j) 


80  that  for  larga  N 


A(0)  : 1 , 


2W  f(Wa) 

A(«  ) r 2 — 

J X 2w  f(w^) 


(3.6) 


Sine*  f(^}  has  period  28  and  is  symmetrical  about  0,  it 
is  symmetrical  about  v.  If  the  realizations  of  X^  are  smooth » 
on  the  average  f(^)  will  be  larger  near  the  origin  and 
smaller  away  from  it.  Thus,  the  interpretation  of  this 
filter  is  exactly  as  before. 


4 . Invariant  Testing. 

Watson  (1974)  studied  the  problem  of  detecting  a peak  of 
unknown  position  in  the  intensity  function.  Both  the  situ- 
ations, times  of  events  given  and  counts  in  equal  intervals 
given,  were  discussed.  The  first  case  was  reduced  to  testing 
for  unifonaity  on  a circle.  To  make  the  correspondence  to 
testing  probability  densities  for  uniformity  clearer,  the 
data  were  divided  by  T to  fit  on  a circle  of  unit  perimeter. 
If  the  times  are  then  x^.  ..x^,  the  locally  most  powerful 
invariant  tests  of  Beran  (1968)  have  the  form:  reject  when 
T^  is  large  where 


j-1 


(4.1) 


13. 


and  thtt  c are  the  Fourier  coefficleAts  of  g(x),  a density 

m 

with  period  unity.  The  family  of  alternatives  is 

(1  - c)  eg(x  P)  where  ^ is  any  number  in  (0,1)  and  t is 

small.  An  alternative  form  of  (4.1)  is 

11  2 

Tm  - - / ir{g(x.  - ♦)  - l)rd^ 

” M 0 * 

11  2 

- M / I - Eg(Xi  - ♦)  - ll^d«  (4.2) 

0 M * 


where 

H ^ I g(x.  - 4) 
1 ^ 


is  a kernel-type  estimator  of  the  density  of  x,  a test  sug- 
gested by  Hatson  (1967)  on  intuitive  grounds.  The  test  T^ 
is  invariant  with  respect  to  the  position  of  4 of  the 
alternative  density  g(x  - 4).  In  our  case, 

X(T(x  - 4)) 
g(x  - 4)  • — ^ 

/ X(t)dt) 

0 

where  X(*}  has  been  made  to  have  period  T.  If  only  the 
coiints  in  N equal  intervals 


j jti 

( - r ) 

II  H 


for  j ■ 0,  ...,  N-1  are  given,  the  natural  approximation  to 
(4.1)  and  (4.2)  is 


I 1^1  - 

m*l  “ M 


N-1  2wij 

I n.  exp  

0 J N 


(4.3) 


where 


14. 


Nd  - exp  -2xiin/N) 
2«iin 


n > 0 . 


(4.4) 


Then  (4.3)  was  shown  to  reduce  to 


2viiBj 


exp 


(4.5) 


because  of  aliasing.  Experiments  with  a special  case  of 
called  (Watson  (1974))  with  N « 20,  showed  that  the  null 
distributions  of  Tjj  and  T,,'  were  practically  the  same. 

Thus,  for  invariant  testing,  often  little  will  be  lost 
by  grouping  and  the  optimal  statistic  (then  (4.5))  is  easily 
computed  using  the  P.P.T.  In  the  notation  cf  this  paper, 

<4. 5)  is 


m«l 


(4.6) 


m*l 


(4.7) 


if  |X(w_)|V£X*  is  always  small.  This  is  so,  for  example,  in 
ni  w 

the  first  special  case  of  Section  2 — a constant  signal.  The 
test  in  the  second  case,  a short  very  strong  signal,  is 
based  on 


N-1  , 

I i»(%)i' 

in>l 

which  is  the  discrete  form  of  a test  suggested  in  Watson 
(1967)  for  this  situation. 

Thus,  the  grouped  form  of  invariant  testing  is  very 
close  to  the  grouped  estimation  problem  of  Section  2.  Both 
benefit  from  being  discussed  in  terms  of  finite  Fourier  trans- 
forms. If  one  ignored  the  test  theory  of  Beran  and  just  knew 
the  results  of  Section  2,  a natural  test  statistic,  to  test 
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1^  B constant,  X say,  would  ba 


H-1  ^ ,1  N-1  , 

Jo  “ iJ  I ' 3^*(Wj)r  (4.t) 


where 


■>1 

X*  (Wj)  " J ^ 


{NX  it  J - 
0 if  j f« 


But 


A(0)n(0)  - jj- 


U(0)| 


(0)1 


K 


Thus,  the  first  tons  lii  (4.8)  merely  tests  the  value  chosen 
X for  the  constant  X^.  The  remaining  terras  test  whether  X^ 
is  constant  so  one  would  use  only  them  for  the  teat.  But 
this  is  (4.7).  In  fact,  |X((i)^)  | VcX^  small,  all  n ^ 0, 
really  means  that  X^  does  not  vary  much  and  so  defines  a 
neighbourhood  of  X^  > constant.  Since  Beran-type  tests  are 
only  locally  most  powerful  invariant,  it  is  not  stirprising 
that  the  two  tests  coincide  in  this  case. 

Alternatives  raay  be  local,  intermediate  or  distant.  The 
last  concept  is  defined  in  Watson  (1974)  from  probability 
distributions  on  the  circle  and  corresponds  to  cases  like 
the  second  exas^le  at  the  end  of  Section  2.  The  optimal 
invariant  tests  are  based  on  supreme,  the  associated  distri- 
butions are  hard  to  find  and  Fourier  analysis  seems  to  be  of 
no  avail.  In  the  present  case  of  (2.15),  this  suggests  the 
intuitive  test— is  the  greatest  of  Oq,  n^,  ...,  too 

large  to  have  occurred  by  chance?  Since  the  n^'s  are  inde- 
pendent Poissons  with  the  same  unkno%m  mean  on  the  null 
hypothesis,  the  test  which  is  Independent  of  the  mean  uses 
the  distribution  of  the  largest  frequency  in  a multinos^al 
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. 

of  N equal  cells  when  the  total  frequency  £n^  is  kept  fixed. 

This  is  not  an  easy  problem.  It  has  been  studied  by  Dudley 
(1971).  In  general  then  distant  tests  may  be  expected  to 
have  the  form: 

max  X(t)  > C 

t»0,...,N-l 

and  to  be  difficult  to  use  because  of  distributional  troubles. 

5.  Spline  Estimation. 

In  Sections  2 and  3,  we  hav'e  looked  only  at  average  mean 
square  error  of  the  estimator  of  X^.  A method  which  has 
a Bayesian  type  of  motivation  arises  if  we  insist  also  that 
be  a smooth  function  of  t.  We  treat  here  the  situation 
of  Section  2.  Thus,  to  the  criterion  (2.2)  we  wish  to  add 

I . 

1 ' 

From  (2.7) 

A^X^  - - J X{«^)A^  exp  -iw^t 

1 ^ 2 
- - J X(uy  exp  -iw^t  (exp  iw^  - 1)  . 

li 

Cy  the  analogue  of  (2.8) 

I J |X(Wj)(exp  - 1)*|  . 

Hence,  our  criterion  will  now  bet  find  A(ti)j)  so  that 

i 

..J I 
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N 9 9 9 : 

E X |A(Mj)n(M^)  - XUj)r  + c^|A(Wj)n(u^)  (exp  - 1)^| 


- ninl  (5.1) 

2 2 ^ 

In  (5.1) « c is  a smoothing  parameter.  As  c X(t)  must 

become  flat. 

Evaluating  (5.1)  by  making  use  of  the  results  that  lead 
up  to  (2.13),  we  find  that  the  j'^th  term  in  (5.1)  is 

|A(»^) |^(ZX^  + |X(«^)|2)(1  + c^lexp  iw^  - 1|^) 

- (A(u^)  A(«j))  |X(Mj)  + lX(ti)^)|^ 


(JX^  + lX(Wj)|^)(l  ♦ - 1|^) 

|X(«.>j)|^ 

(JX^  ♦ |X(«^)|^)(1  ♦ - 1|^) 


A(w^) 


|X(w^)|^  - 


IXfWj)!* 


(IXi  ♦ |X(«.)|^)(1  + c^|e^“j  - 1|^) 


(5.2) 


where  the  last  term  in  (5.2)  is  positive.  Thus,  the 
optimal  filter  now  is  given  by 

|X(w.j)|^ 

A(w^)  3 . (5.3) 

(iX^  ♦ |X(«^)|2)(1  ♦ c^lexp  i»j  - 1|^) 

2 

Clearly,  the  result  (5.3)  reduces  to  (2.15)  when  c 0. 

2 

When  c A(iiij)'»0  unless  j « 0 so  that  the  aj^  become  equal 
and  the  estimator  becomes  constant  for  t ■ 1,...,  N as  pre* 
dieted  above. 

As  in  Section  2,  it  is  easy  to  calculate  the  time  filter 


coefficients  using  the  Fast  Fourier  Transfora. 
The  use  of 


? 2 
I 14  V 
1 ^ 

to  control  smoothness  is  common  but  had  tre  chosen 

I (4'’X.)" 

1 ^ 

the  optimal  filter  would  have  been 

A(w^)  - - ■ ■ * (5.4) 

(ZX^  + |l(«j)|2)(l  + c^lcxp  iw^  - 1|2P) 

and  its  properties  for  p ^ 1 are  the  same. 

Splines  in  the  discrete  case  were  first  suggested  by 
Whittaker  (1923)  and  in  the  continuous  case  by  Schoenberg 
(1964).  To  obtain  a Bayesian  motivation  for  (5.1),  it  is 
usual  to  invent  a process  for  which  makes  the  likelihood 
function  depend  on  J(aPx^)^.  This  would  be  so  if  A^X^  for 
the  various  values  of  t were  independently  Gaussian  with  the 
same  variance.  In  continuous  time,  this  requires  X(t)  to  be 
an  Integrated  Brownian  motion. 

6.  Another  Model. 

Given  the  above  development,  it  seems  worthwhile  to  show 
how  the  Weiner  Filter  may  be  easily  derived.  This  means 
changing  the  model  from  the  N.H.P.P.  In  fact,  we  have  only 
assumed,  in  the  Section  2 derivation  of  the  optimal  linear 
filter,  that 

E(n^)  ■ X^,var(n^)  ■ X^  ,cov(n^  ,n^,)  -0,  t t'  . 


(6.1) 


Instead,  suppose  that 
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■ (n^)  « # cov(n^  * (^*2) 

Thus,  the  cotmts  in  different  intervals  may  be  correlated 
but  this  correlation  should  depend  only  upon  their  spacing. 

In  this  sense,  the  assumptions  are  weaker.  However, 
p(0)  « var(n^)  must  now  be  constant  and  not  depend  upon  X^. 
Then  (2.13)  becomes 

E|n(u^)l^  - llEln^n^t)  ®xp  iUj(t-t') 

• |X(w^)|^  ®XP  i«j(t-t').  (6.3) 

Given  the  fact  that  we  have  extended  the  data  to  make  n^  of 
period  N,  p(T)  in  (6.2)  must  have  period  N.  Thus,  p(~l)  ■ 
p(N-l),  p(-2)  « p(N-2),  etc.,  so  that  the  coveuriance  matrix 
C whose  element  in  the  t>th  row,  t'-th  column  is  p(t-t')  is 
an  NKN  circw.lant.  The  eigenvectors  of  such  matrices  are 
(exp  iw^  • 0,  exp  iw^  • 1,  ...,  exp  iUj(N-l)],  j *»  0,  ..., 
K-1.  Denoting  the  corresponding  eigenvalues  by  f(Uj),  (6.3) 
becomes 

E|n(w^)|^  - |X(Wj)|^  + f(«j)  . (6.4) 

In  this  case,  the  analogue  of  (2.15)  is  seen  to  be 

|A(a».)l2  . 

A(fai.)  » 5J- . (6.5) 

J |X(w^) r + f (Wj) 

This  is  nothing  but  the  classical  result  of  Wiener  for 
filtering  a stationary  process* -except  that  we  have  derived 
it  on  the  circle. 

In  order  to  use  (6.5)  in  practice  to  derive  a good 
smoothing  formula,  one  needs  to  make  a good  guess  at  the 
’’noise”  spectrum  as  well  as  the  signal  spectrum 

{X(u^)|  , which  was  all  that  was  needed  in  Section  2.  This 
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is  the  price  of  the  weaker  assumption. 

7.  An  Even  More  Applied  Approach. 

A data  analyst,  knowing  that  E(n^)  « Var  (n^)  ■ 
would  probably  consider  y^  ■ 2i^n^  which  will  have  approxi- 
mately unit  variance  zuid  mean  2»0r^,  provided  A^  is  not  small. 

In  passing  a curve  through  the  points  (t,y^),  the  analyst 
will  not  be  distracted  by  variance  changes.  Further  y^  will 
be  approximately  Gaussian  so  it  makes  sense  to  pass  the 
curve  through  the  "center"  of  the  band  of  points.  Finally, 
any  smoothing  method  could  be  considered,  without  regard  to 

A 

positivity,  since,  given  the  smoothed  values,  y^  say,  a 
positive  smoothed  estimate  of  A^  is  available  from 


If  other  transformations  such  as  2/n^  + 3/8  or  /n^  + /n^+1 
are  used,  the  formulae  replacing  (7.1)  are  obtained  similarly. 

To  counter  criticisms,  this  analyst  might  object  to 
(1.2)  as  a criterion  on  the  grounds  that  the  derivations 

A 

X(t)  - A(t)  are  weighted  equally  despite  the  fact  that 
Var(A(t)  - A(t))  • 0(X(t)).  The  analyst  might  also  criticize 
the  notion  of  optimal  smoothing  by  saying  that,  since  the 
true  X(t)  is  not  known,  one  must  try  various  smoothers  on  the 
data,  possibly  smoothing  less  in  some  stretches  than  in 
others  to  retain  resolution. 

It  is,  of  course,  possible  to  formalize  smoothing  with 
y^,  the  square  roots  of  the  counts.  A ’standard  procedure 
would  be  to  use  a spline  fit,  e.g.,  find  » 2^^  to 
minimize 

• 

The  first  term  is  more  justifiable  now  since  the  y^  have 
approximately  the  same  variance  about  u^.  The  second  tern 
insists  on  smoothness  of  the  fitted  The  K is  a tuning 

parameter,  K > 0 implying  no  smoothing  at  all,  l.e., 
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linear  estimator  minimizing  the  expected  average  mean  square 
error  Is  derived.  This  Is  generalized  by  adding  to  the  criterion 
a multiple  of  the  expected  average  squared  p^  difference. 

This  leads  to  splines. 
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